To address the poor scalability of traditional hierarchical clustering algorithms on large-scale text, a parallel hierarchical clustering algorithm based on the MapReduce programming model was proposed. A vertical data partitioning algorithm, based on the statistical characteristics of the component groups of the text vectors, was developed to partition the data for MapReduce. In addition, the sorting behavior of MapReduce was used to select the merge points, which makes the algorithm more efficient and helps improve clustering accuracy. Experimental results show that the proposed algorithm is effective and scales well.
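
The idea of using the sort step to select merge points can be illustrated with a minimal single-process sketch. It is not the proposed implementation: the abstract does not specify the distance measure, the key layout, or the partitioning details, so cosine distance, the function names (`map_phase`, `shuffle_sort`, `reduce_phase`, `select_merge_point`), and the toy vectors below are all assumptions made for illustration. The sketch emits each pairwise distance as the key, lets a sort stand in for the framework's sort-by-key between map and reduce, and takes the first record as the merge point.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance between two dense vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def map_phase(vectors):
    """Emit (distance, (id_a, id_b)) for every pair; distance is the key."""
    for (i, u), (j, v) in combinations(sorted(vectors.items()), 2):
        yield cosine_distance(u, v), (i, j)


def shuffle_sort(records):
    """Stand-in for the sort-by-key that MapReduce performs before reduce."""
    return sorted(records)


def reduce_phase(sorted_records):
    """After sorting, the first record is the closest pair (the merge point)."""
    return next(iter(sorted_records), None)


def select_merge_point(vectors):
    return reduce_phase(shuffle_sort(map_phase(vectors)))


# Hypothetical toy text vectors, keyed by document id.
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
}
merge_point = select_merge_point(docs)
print(merge_point)  # (distance, (id_a, id_b)) of the closest pair
```

In this sketch the reducer does no searching of its own, since the ordering produced by the sort already places the closest pair first. In an actual MapReduce job the pairwise distances would be computed across partitions rather than in one process.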